A Morphological Analyzer for Japanese Nouns, Verbs and Adjectives
Abstract
We present an open source morphological analyzer for Japanese nouns, verbs and adjectives. The system builds upon the morphological analysis capabilities of MeCab [Matsumoto et al., 1999] to incorporate finer details of classification such as politeness, tense, mood and voice attributes. We implemented our analyzer as a finite state transducer using the open source finite state compiler toolkit FOMA [Hulden, 2009].

1 Basic Information about Japanese

The Japanese language is spoken by more than 100 million speakers (mostly in Japan). It is an agglutinative language with SOV word order. The Japanese writing system makes extensive use of Chinese characters, also known as kanji, along with two syllabic scripts, hiragana and katakana.

Three main lexical classes in Japanese exhibit morphology: nouns, verbs and adjectives. Adverbs are often adjectives suffixed with a special morpheme, so they are often not considered a separate word class. In §4, we will discuss the morphological phenomena for each of these word classes.

2 Past Work on Japanese Morphology

In traditional Japanese morphological analysis, a lexicon is assumed. The lexicon is a list of pairs, each consisting of a word and its corresponding part of speech. As Japanese is an unsegmented language, past work on Japanese morphological analyzers requires the use of a segmenter. Often, the segmentation step is conducted jointly with the morphological analysis step in a rule-based [Matsumoto et al., 1991] or machine learning framework [Kudo et al., 2004, Matsumoto et al., 1999]. The current state-of-the-art Japanese part-of-speech and morphological analyzer is MeCab [Matsumoto et al., 1999], which is extended from ChaSen [Matsumoto et al., 1991] but uses CRFs [Lafferty et al., 2001] instead of HMMs to model the morpheme sequences.
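To make the lexicon-plus-segmenter setup concrete, the following is a minimal Python sketch of a lexicon stored as (word, part-of-speech) pairs together with a greedy longest-match segmenter. It is an illustration only: the lexicon entries, the tag names, and the functions `build_index` and `segment` are assumptions made for this example, and the greedy strategy is a simple stand-in, not the HMM or CRF models used by ChaSen or MeCab and not the finite state transducer presented in this paper.

```python
# Toy lexicon of (surface form, part of speech) pairs.
# Entries and tag names are hypothetical and chosen only for illustration.
LEXICON = [
    ("私", "pronoun"),
    ("は", "particle"),
    ("本", "noun"),
    ("を", "particle"),
    ("読む", "verb"),
    ("読み", "verb"),
    ("ます", "auxiliary"),
]


def build_index(lexicon):
    """Map each surface form to the list of parts of speech it can take."""
    index = {}
    for word, pos in lexicon:
        index.setdefault(word, []).append(pos)
    return index


def segment(text, index):
    """Greedy longest-match segmentation of unsegmented text.

    At each position, take the longest lexicon entry that matches.
    Characters not covered by the lexicon become one-character tokens
    tagged 'unknown'.
    """
    max_len = max(len(word) for word in index)
    tokens = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in index:
                tokens.append((candidate, index[candidate]))
                i += length
                break
        else:
            # No lexicon entry starts here: emit a single unknown character.
            tokens.append((text[i], ["unknown"]))
            i += 1
    return tokens


if __name__ == "__main__":
    for surface, tags in segment("私は本を読みます", build_index(LEXICON)):
        print(surface, tags)
```

Greedy matching commits to the longest entry at each position and cannot recover from a wrong choice, which is one reason joint segmentation and tagging with a sequence model is generally preferred in practice.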
∗ This morphological analyzer was developed as part of the project requirements for the Spring 2013 NLP Lab (11-712) at Carnegie Mellon University (http://www.cs.cmu.edu/~nasmith/NLPLab/). The source code and tool are available at https://bitbucket.org/skylander/yc-nlplab/.
MeCab: http://mecab.googlecode.com/svn/trunk/mecab/doc/index.html
ChaSen: http://chasen-legacy.sourceforge.jp/
Journal: CoRR
Volume: abs/1410.0291
Publication date: 2014